fix(dcutr): handle empty holepunch_candidates #5583

stormshield-frb · 2024-08-30T14:32:09Z

Description

A few months ago, we were experiencing from time to time some weird failures with DCUTr. After some research to find out the problem, it was a race condition : sometimes identify must be a little bit slow and the DCUTr handler is created before an identify event is received. Alone, this is not necessarily a problem. But if this race happens when the holepunch candidates list of DCUTr is empty, then DCUTr will always fail for this connection. Indeed, when receiving an new relayed established connection, DCUTr will create an Handler for this connection which will be responsible to make the hole-punching. However, the candidates that are used are the one known at Handler instantiation, so any future updates about NewExternalAddrCandidate will not be forwarded to the Handlers.

This PR is the upstream of the fix we did several months ago and we did not encountered any particular problem with it since.

Timetable of the problem

Peer 1	Time	Peer 2
Connected to relay	0.416
Receive identify from relay	0.420
	0.646	Connected to relay
	0.650	Dial Peer 1 through relay
	0.655	Connected to Peer1 through relay
	0.655	Create DCUTr handler with no hole-punch candidates
Connected to Peer 2 through relay	0.657
	0.659	Receive identify from relay (first observed_addr)
DCUTr fail `OutboundError(NoAddress)`	0.663
	0.664	DCUTr fail `InboundError(UnexpectedEof)`

Notes & open questions

I have put some TODOs about the potential merging of self.attempts += 1. When inbound, self.attempts is incremented when starting an handshake, however, when outbound, self.attempts is incremented at the "new outbound substream" request. Before I don't think it was a problem, but now that we do not necessarily trigger an handshake if there is no hole-punch candidates, I think we might was to increment self.attempts only when effectively starting the handshake. What do you think ?
It is noted in the log when starting a new handshake that, if the corresponding stream (inbound_stream or outbound_stream) was not empty, then we replace the handshake. There is warn level log stating New inbound/outbound connect stream while still upgrading previous one. Replacing previous with new. However, when reading the code of the FuturesSet::try_push method and then the FuturesMap::try_push method (which is used inside), the future pushed never replaces any old one when capacity is reached, it just returns an error. So what do you think should be done ? Should be actually replace the old with the new like the log says ? Or should we not replace the old with the new and update the log to say that the new one was dropped ?

Change checklist

I have performed a self-review of my own code
I have made corresponding changes to the documentation
I have added tests that prove my fix is effective or that my feature works
A changelog entry has been made in the appropriate crates

stormshield-frb added 2 commits August 30, 2024 15:46

fix(dcutr): handle empty holepunch_candidates

0129730

set PR number

24cca4a

dariusc93 requested a review from jxs August 30, 2024 18:53

dariusc93 and others added 3 commits September 4, 2024 10:17

Merge branch 'master' into fix/dcutr-handle-empty-holepunch-candidates

1e85a11

Merge branch 'master' into fix/dcutr-handle-empty-holepunch-candidates

d02cd9a

Merge branch 'master' into fix/dcutr-handle-empty-holepunch-candidates

0adb765

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(dcutr): handle empty holepunch_candidates #5583

fix(dcutr): handle empty holepunch_candidates #5583

stormshield-frb commented Aug 30, 2024 •

edited

Loading

fix(dcutr): handle empty holepunch_candidates #5583

Are you sure you want to change the base?

fix(dcutr): handle empty holepunch_candidates #5583

Conversation

stormshield-frb commented Aug 30, 2024 • edited Loading

Description

Timetable of the problem

Notes & open questions

Change checklist

stormshield-frb commented Aug 30, 2024 •

edited

Loading